Abstract

In this paper it is shown that given a sufficient number of (noisy) random binary linear equations, 
the Learning from Parity with Noise (LPN) problem can be solved in essentially cube root time in 
the number of unknowns. The techniques used to recover the solution are known from fast correlation 
attacks on stream ciphers. As in fast correlation attacks, the performance of the algorithm depends on 
the number of equations given. It is shown that if this number exceeds a certain bound, and the bias of
the noisy equations is polynomial in the number of unknowns n, the running time of the algorithm is reduced
to $2^{n/3 + o(n)}$, compared to the brute-force checking of all $2^n$ possible solutions. The mentioned bound is
explicitly given and it is further shown that when this bound is exceeded, the complexity of the approach 
can even be further reduced.

1 Introduction

In many cryptanalyses, especially in fast correlation attacks on stream ciphers, some information on the
secret key is leaked in the form of a set of linear binary equations which are satisfied with probability greater
than one half. For each of these equations, let $q = \frac{1}{2} + \epsilon$ be the probability that the secret key is in the
solution set. We call $\epsilon$ the bias. It is clear that if $\epsilon = \frac{1}{2}$, every equation essentially halves the number of
possible solutions, as long as it is independent of the previous ones. In particular, if the system of equations
has full rank, the key can be recovered in polynomial time by simple Gaussian elimination. An interesting
problem is how to recover the key if $\epsilon < \frac{1}{2}$. The LPN problem (e.g. [5], [7]) captures the essence of this
task. Let

• $x \in \mathbb{F}_2^n$ be an n-dimensional binary vector, also referred to as the key in the sequel.

• $E \sim \mathrm{Ber}(p)$ be a random variable with $\Pr(E = 1) = p = \frac{1}{2} - \epsilon$ and $\Pr(E = 0) = q = \frac{1}{2} + \epsilon$, $\epsilon \in [0, \frac{1}{2}]$.

• $\mathcal{O}_\epsilon$ be an oracle that chooses $g \in \mathbb{F}_2^n$ uniformly at random and outputs pairs $(\langle g, x\rangle + e, g)$, where e is
drawn according to E and $\langle\cdot,\cdot\rangle$ denotes the usual inner product. The g's can be seen as binary linear
equations and computing the scalar product with x corresponds to evaluating them at x.
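
The oracle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `make_lpn_oracle`, the list-of-bits representation of x and g, and the use of Python's `random` module are choices made here and do not come from the paper.

```python
import random

def make_lpn_oracle(x, eps, rng=None):
    """Return a function that yields one sample (<g, x> + e mod 2, g).

    x   : secret key as a list of bits (hypothetical representation)
    eps : bias, so the noise bit e equals 1 with probability 1/2 - eps
    """
    rng = rng or random.Random()
    n = len(x)
    p = 0.5 - eps  # Pr(E = 1)

    def oracle():
        g = [rng.randrange(2) for _ in range(n)]       # uniform g in F_2^n
        e = 1 if rng.random() < p else 0               # noise bit drawn from Ber(p)
        inner = sum(gi & xi for gi, xi in zip(g, x)) % 2
        return (inner + e) % 2, g

    return oracle
```

With eps = 1/2 the noise bit is always 0, so every sample is an exact linear equation in x; smaller eps makes the equations correspondingly less reliable.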

The n-dimensional LPN$_\epsilon$ problem can be stated as follows: Given $\epsilon$ and $\mathcal{O}_\epsilon$, recover x. A lower bound on the
number N of oracle calls necessary in order to be able to identify the correct x with non-negligible probability
can be given. This bound corresponds to the number of samples N necessary in order to make a good guess
whether a random variable X is distributed according to $\mathrm{Ber}(p)$ or $X \sim \mathrm{Ber}(\frac{1}{2})$, where Ber denotes the usual
Bernoulli distribution. It is common knowledge that this number satisfies $N = O(\frac{1}{\epsilon^2})$, as can easily be seen
by Hoeffding's inequality [2]. As a consequence x can be recovered in time $O(2^n \log\frac{1}{\epsilon^2})$ making $N = O(\frac{1}{\epsilon^2})$
oracle calls. This is achieved by evaluating all equations at $2^n$ points, using the techniques of the fast Walsh
transform [3]. While the LPN problem is proven to be NP-hard [1], in the case where $N \gg \frac{1}{\epsilon^2}$ faster
approaches than brute-force checking of all potential keys are possible. Especially the techniques known
from fast correlation attacks (see e.g. [3], [2], [5]) are well applicable. The core of most techniques lies
in finding linear combinations of the given equations such that a hypothesis on a subset of keybits can be
tested. The application of these techniques to the LPN problem has already been studied in e.g. [3], [5], [7].
While the attack we consider does not differ from, e.g., the one in [5], our approach to the problem is different.
From past work, e.g. [2], [3], [5], [4], [7], it is not immediately clear how the complexity behaves depending on
the number N of random linear equations and the bias $\epsilon$. The influence of N and $\epsilon$ becomes explicit in our
considerations. We will show that if $\epsilon = \frac{1}{\mathrm{poly}(n)}$, then the LPN problem can be solved in time $2^{n/3 + o(n)}$ given

$N = 2^{n/\log n + o(n)}$ equations.
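
To make the $1/\epsilon^2$ sample bound mentioned above concrete, here is a minimal sketch based on the one-sided Hoeffding inequality. It assumes a simple threshold test (decide for Ber(1/2 − ε) when the fraction of ones falls below 1/2 − ε/2), which is a choice made for this illustration; the function name `samples_needed` and the error parameter `delta` are hypothetical and not taken from the paper.

```python
import math

def samples_needed(eps, delta):
    """Number of samples that suffices, by Hoeffding's inequality, to tell
    Ber(1/2 - eps) from Ber(1/2) with error probability at most delta.

    With the threshold at 1/2 - eps/2, each error event requires the empirical
    mean to deviate by eps/2, which has probability <= exp(-N * eps**2 / 2).
    """
    return math.ceil(2.0 * math.log(1.0 / delta) / eps ** 2)
```

Halving the bias roughly quadruples the returned value, which is the $1/\epsilon^2$ behaviour referred to in the text.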

The paper is organized as follows. In Section 2 a short overview of the fast correlation attack techniques
is given. In Section 3 it is shown how the complexity to recover the secret key depends on the number N of
oracle calls, and the main result is stated at the end of the section. Section 3.1 contains the case where the
number N of given equations exceeds the bound sufficient for a cube-root attack. In Section 4 an illustrative
example is given. Throughout the paper, log will denote the logarithm to base 2.



2 Linear Combination and Hypothesis Testing 

Most fast correlation attacks rely on the principles of linear combination and hypothesis testing. The goal of 
linear combination lies in constructing binary linear equations that depend only on a subset of the keybits. 
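These equations can then be used to test a hypothesis on this subset of keybits. Let $g'_i = (\langle g_i, x\rangle + e_i, g_i) \in \mathbb{F}_2^{n+1}$
be a sample output by the oracle $\mathcal{O}_\epsilon$. Note that if we add w random samples $g'_{i_1}, \ldots, g'_{i_w}$ from $\mathcal{O}_\epsilon$, i.e.
if we consider

    $g' = \sum_{j=1}^{w} g'_{i_j} = \left(\sum_j \left(\langle g_{i_j}, x\rangle + e_{i_j}\right), \sum_j g_{i_j}\right) = \left(\left\langle \sum_j g_{i_j}, x\right\rangle + \sum_j e_{i_j}, \sum_j g_{i_j}\right)$,

this looks like a sample output from $\mathcal{O}_{\epsilon'}$, with $\epsilon' = 2^{w-1}\epsilon^w$. This can be seen by the well-known Piling-up lemma (e.g. [10]). By
appropriately choosing w-tuples of samples from $\mathcal{O}_\epsilon$, we can get equations from $\mathcal{O}_{\epsilon'}$ which depend only on a
subset of keybits.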
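
The bias $2^{w-1}\epsilon^w$ of a sum of w samples can be checked numerically. The sketch below assumes w independent noise bits, each equal to 0 with probability 1/2 + ε, and computes the exact distribution of their XOR step by step; the function names are illustrative only.

```python
def xor_bias(eps, w):
    """Exact bias of the XOR of w independent bits, each 0 with probability 1/2 + eps."""
    q = 0.5 + eps
    prob_zero = 1.0  # the XOR of zero bits is 0 with certainty
    for _ in range(w):
        # the new XOR is 0 if both parts agree: (0, 0) or (1, 1)
        prob_zero = prob_zero * q + (1.0 - prob_zero) * (1.0 - q)
    return prob_zero - 0.5

def piling_up(eps, w):
    """Closed form given by the Piling-up lemma."""
    return 2 ** (w - 1) * eps ** w
```

Both functions agree up to floating-point error, for example for eps = 1/8 and w = 4.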

Lemma 1. Let $w \in \mathbb{N}$ be even and $N \gg w$ be the number of samples given from $\mathcal{O}_\epsilon$. Then all w-ary linear
combinations of these equations which are all zero in the last b bits can be found in time and space

    $O\left(\max\left\{N^{w/2}, \frac{N^w}{2^b}\right\}\right)$.    (1)

Proof. The number of all $\frac{w}{2}$-ary linear combinations of the given equations equals

    $\binom{N}{w/2} = O\left(N^{w/2}\right)$    (2)

for fixed w. Compute these linear combinations and store the resulting equations in blocks according to the
last b bits, i.e. inside a block the new equations coincide on the last b bits. In each of the $2^b$ blocks there
are an expected number of

    $\binom{N}{w/2} \Big/ 2^b$

equations. Inside each block, take all 2-ary combinations, which every time gives an expected number of
$O\left(N^w / 2^{2b}\right)$ equations of the desired form. As there are $2^b$ blocks, we get an expected number of

    $2^b \cdot O\left(\frac{N^w}{2^{2b}}\right) = O\left(\frac{N^w}{2^b}\right)$    (3)

equations of the desired form. The complexity of the whole is the sum of the complexities for getting all the
$\frac{w}{2}$-ary linear combinations (2) and all the 2-ary combinations (3) inside the $2^b$ blocks. □
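
The two-stage search in the proof can be sketched as follows. The encoding is an assumption made for this illustration: each equation is packed into a Python int whose lowest b bits play the role of the last b coordinates, and the name `find_combinations` is hypothetical. Unlike the counting argument in the proof, the sketch discards pairs of half-combinations that share a sample, and the same set of w samples may be reported several times through different splits.

```python
from collections import defaultdict
from itertools import combinations

def find_combinations(vectors, w, b):
    """Return (xor, indices) for w-ary combinations that vanish on the low b bits."""
    half = w // 2
    mask = (1 << b) - 1
    blocks = defaultdict(list)

    # Stage 1: all (w/2)-ary combinations, sorted into blocks by their low b bits.
    for idx in combinations(range(len(vectors)), half):
        acc = 0
        for i in idx:
            acc ^= vectors[i]
        blocks[acc & mask].append((acc, idx))

    # Stage 2: inside a block, any two entries cancel each other on the low b bits.
    found = []
    for block in blocks.values():
        for (u, iu), (v, iv) in combinations(block, 2):
            if set(iu).isdisjoint(iv):  # keep only genuine w-ary combinations
                found.append((u ^ v, iu + iv))
    return found
```

Stage 1 touches on the order of $N^{w/2}$ combinations and stage 2 on the order of $N^w/2^b$ pairs in expectation, which mirrors the two terms in (1).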

Clearly the $\frac{N^w}{2^b}$ equations found as in Lemma 1 depend on the first n − b keybits only. A hypothesis on
these bits can be tested if

    $\frac{N^w}{2^b} \ge \frac{1}{\epsilon'^2} = \frac{1}{2^{2(w-1)}\epsilon^{2w}}$,    (4)

as discussed in Section 1. Note that we implicitly assume that the new equations are pairwise independent,
which seems to be an admissible assumption [5]. In order to find the correct n − b keybits, all possible hypotheses
on these bits are checked. This can be done by techniques of the fast Walsh transform [3].
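
Testing all hypotheses at once can be illustrated with the textbook fast Walsh-Hadamard transform. The sketch below is not the paper's implementation: it assumes samples of the form (label, g) with g encoded as an integer below $2^k$, and the names `walsh_hadamard` and `best_hypothesis` are chosen here for illustration.

```python
def walsh_hadamard(values):
    """In-place style butterfly; input length must be a power of two."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for j in range(start, start + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def best_hypothesis(samples, k):
    """Return the k-bit candidate y that agrees with the most samples.

    After the transform, entry y holds (#samples with label == <g, y>)
    minus (#samples with label != <g, y>).
    """
    counts = [0] * (1 << k)
    for label, g in samples:
        counts[g] += 1 if label == 0 else -1
    spectrum = walsh_hadamard(counts)
    return max(range(1 << k), key=spectrum.__getitem__)
```

The textbook transform on $2^k$ entries uses about $k \cdot 2^k$ additions; the cost expression stated in Lemma 2 is the paper's own accounting.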

Lemma 2. Evaluating $\frac{1}{2^{2(w-1)}\epsilon^{2w}}$ binary linear equations in n − b variables can be done in time

    $2^{n-b} \log\frac{1}{2^{2(w-1)}\epsilon^{2w}}$.    (5)

In the next section we derive the optimal choice of the parameters w and b, and we will see how the
resulting complexity behaves depending on N.

3 Cube-root algorithm 

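Suppose we are given $N > \frac{1}{\epsilon^2}$ samples from $\mathcal{O}_\epsilon$. In Section 2 we have seen that
if for $w, b \in \mathbb{N}$ it holds that w is even and

    $\frac{N^w}{2^b} \ge \frac{1}{2^{2(w-1)}\epsilon^{2w}}$,    (6)

then we can recover the first n − b keybits in time

    $O\left(\max\left\{N^{w/2}, \frac{N^w}{2^b}, 2^{n-b}\log\frac{1}{2^{2(w-1)}\epsilon^{2w}}\right\}\right)$.    (7)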

In this section we will show how to find optimal parameters b and w such that the expression in (7) is
minimal under the condition that the inequality (6) is satisfied. Clearly (6) is equivalent to

    $w(\log N + 2 + 2\log\epsilon) \ge b + 2$,

by taking the logarithm on both sides. We will now show that in order to reach minimal complexity in (7)
this inequality must be satisfied with equality. Note that the right-hand side of the inequality is increasing
with b and, as $N > \frac{1}{\epsilon^2}$, the left-hand side is increasing with w. Suppose that for a given choice of b and w
the inequality is strict. Then either b can be increased or w can be decreased, resulting in a decrease of the
overall complexity (7), while the inequality still holds. So we can require equality

    $w(\log N + 2 + 2\log\epsilon) - 2 = b$.    (8)
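
For readers who want the intermediate step spelled out, the following lines sketch how the logarithmic form of the condition is obtained. This only restates the text above with log taken to base 2; it adds no new assumption.

```latex
\begin{align*}
\frac{N^w}{2^b} \ge \frac{1}{2^{2(w-1)}\epsilon^{2w}}
  &\iff w\log N - b \ge -2(w-1) - 2w\log\epsilon \\
  &\iff w\log N + 2w + 2w\log\epsilon \ge b + 2 \\
  &\iff w(\log N + 2 + 2\log\epsilon) \ge b + 2 .
\end{align*}
```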

Using this in equation (7), we get the following overall complexity

    $O\left(\max\left\{N^{w/2}, \frac{1}{2^{2(w-1)}\epsilon^{2w}}, \frac{2^n}{N^w 2^{2(w-1)}\epsilon^{2w}} \log\frac{1}{2^{2(w-1)}\epsilon^{2w}}\right\}\right)$.

In order to ease discussion we adjust the condition on N. From now on we will assume that

    $N > \frac{4}{(2\epsilon)^4}$.

As a direct consequence

    $N^{w/2} > \frac{1}{2^{2(w-1)}\epsilon^{2w}}$,

and the overall complexity equals

    $O\left(\max\left\{\underbrace{N^{w/2}}_{\alpha(w)}, \underbrace{\frac{2^n}{N^w 2^{2(w-1)}\epsilon^{2w}} \log\frac{1}{2^{2(w-1)}\epsilon^{2w}}}_{\beta(w)}\right\}\right)$.
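
The step from the assumption $N > \frac{4}{(2\epsilon)^4}$ to the bound on $N^{w/2}$ used above is short. A sketch, assuming only that $w \ge 2$ (which holds because w is even and positive):

```latex
\begin{align*}
N > \frac{4}{(2\epsilon)^4} = \frac{1}{4\epsilon^4}
  \;\Longrightarrow\;
N^{w/2} > \left(\frac{1}{4\epsilon^4}\right)^{w/2}
        = \frac{1}{2^{w}\epsilon^{2w}}
        \ge \frac{1}{2^{2(w-1)}\epsilon^{2w}},
\end{align*}
```

where the last inequality uses $2^w \le 2^{2(w-1)}$ for $w \ge 2$.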



One readily verifies that $\alpha(w)$ is growing with w and $\beta(w)$ is decreasing with w. Hence the whole term reaches
its minimum at the intersection of the two functions, i.e. if $\alpha(w) = \beta(w)$. In order to get an (approximate)
solution for the equation $\alpha(w) = \beta(w)$, we ignore the logarithmic term in $\beta(w)$ and obtain:

    $w = \frac{n + 2}{\frac{3}{2}\log N + 2 + 2\log\epsilon}$.    (9)

Using (8), for b we obtain:

    $b = w(\log N + 2 + 2\log\epsilon) - 2 = n - \frac{(n + 2)\log N}{3\log N + 4 + 4\log\epsilon} = n - \frac{w}{2}\log N$.    (10)
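
The parameter choice in (9) and (10) is easy to evaluate numerically. The helper below is a small sketch; the name `optimal_parameters` and the convention of passing $\log N$ and $\log\epsilon$ (both base 2) as arguments are assumptions made here.

```python
def optimal_parameters(n, log_N, log_eps):
    """Real-valued w and b from equations (9) and (10); logs are base 2."""
    w = (n + 2) / (1.5 * log_N + 2 + 2 * log_eps)                 # equation (9)
    b = n - (n + 2) * log_N / (3 * log_N + 4 + 4 * log_eps)       # equation (10)
    return w, b
```

For n = 128, $\log N = 20$ and $\epsilon = 1/8$ this gives w = 5 and b = 78, and one can check that $\frac{w}{2}\log N = n - b$, i.e. the two terms balanced in the derivation coincide once the logarithmic factor is ignored.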



We will now examine how this choice of the parameters affects the complexity of the linear combination and
hypothesis testing approach. For simplicity in the further analysis let us define

    $T_\epsilon(N) := \frac{\log N}{3\log N + 4 + 4\log\epsilon}$.    (11)

So we can write

    $w = \frac{2(n + 2)}{3\log N + 4 + 4\log\epsilon} = \frac{2(n + 2)}{\log N}\, T_\epsilon(N)$,    (12)

and

    $b = w(\log N + 2 + 2\log\epsilon) - 2 = n - (n + 2)\, T_\epsilon(N)$.    (13)



Lemma 3. Notation as in the considerations before. Making $N > \frac{4}{(2\epsilon)^4}$ oracle calls and writing
$r := 2\left\lfloor\frac{w + 1}{2}\right\rfloor - w$ and $T := T_\epsilon(N)$, the n-dimensional LPN$_\epsilon$ problem can be solved in time and space

    $O\left(2^{(n+2)T + |r|\log N + \log((n+2)T + \log N)}\right)$.    (14)



Proof. Let w be as in (9). Define

    $w' := 2\left\lfloor\frac{w + 1}{2}\right\rfloor$  and  $b' := \left\lfloor w'(\log N + 2 + 2\log\epsilon) - 2\right\rfloor$.    (15)

This definition ensures that w' and b' are integers and w' = w + r is even with $r \in [-1, 1]$. Further



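    $\frac{N^{w'}}{2^{b'}} \ge \frac{N^{w'}}{2^{w'(\log N + 2 + 2\log\epsilon) - 2}} = \frac{1}{2^{2(w'-1)}\epsilon^{2w'}}$,

so we have enough equations to check a hypothesis on the n − b' nonzero bits. The complexity for finding
the w'-ary linear equations equals

    $N^{w'/2} = N^{\frac{w}{2} + \frac{r}{2}} = 2^{(n+2)T + \frac{r}{2}\log N}$.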
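Let us now examine the complexity for evaluating these equations at $2^{n-b'}$ points. We have

    $b' = \left\lfloor w'(\log N + 2 + 2\log\epsilon) - 2\right\rfloor$
    $\;\;\; = \left\lfloor (w + r)(\log N + 2 + 2\log\epsilon) - 2\right\rfloor$
    $\;\;\; = \left\lfloor w(\log N + 2 + 2\log\epsilon) - 2 + r(\log N + 2 + 2\log\epsilon)\right\rfloor$
    $\;\;\; \ge w(\log N + 2 + 2\log\epsilon) - 2 + r(\log N + 2 + 2\log\epsilon) - 1/2$
    $\;\;\; \stackrel{(13)}{=} n - (n + 2)T + r(\log N + 2 + 2\log\epsilon) - 1/2$.

Hence

    $2^{n-b'} \log\frac{1}{2^{2(w'-1)}\epsilon^{2w'}} \le 2^{(n+2)T - r(\log N + 2 + 2\log\epsilon) + 1/2} \cdot \underbrace{\log N^{w'/2}}_{= (n+2)T + \frac{r}{2}\log N}$.

Adding these two upper bounds, we obtain the overall complexity

    $2^{n-b'} \log\frac{1}{2^{2(w'-1)}\epsilon^{2w'}} + N^{w'/2} \le 2^{(n+2)T - r(\log N + 2 + 2\log\epsilon) + \frac{1}{2} + \log\left((n+2)T + \frac{r}{2}\log N\right)} + 2^{(n+2)T + \frac{r}{2}\log N}$    (16)
    $\;\;\; = O\left(2^{(n+2)T + |r|\log N + \log\left((n+2)T + \frac{r}{2}\log N\right)}\right)$.    (17)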

□ 
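
The rounding step (15) can be tried out with a short sketch. The function names and the convention of passing base-2 logarithms are assumptions of this illustration; `enough_equations` checks, in the log domain, the condition that the rounded parameters still provide enough equations for hypothesis testing.

```python
import math

def round_parameters(w, log_N, log_eps):
    """Return (w', b', r) as in (15), where w' = w + r is the rounded even weight."""
    w_even = 2 * math.floor((w + 1) / 2)
    b_int = math.floor(w_even * (log_N + 2 + 2 * log_eps) - 2)
    return w_even, b_int, w_even - w

def enough_equations(w_even, b_int, log_N, log_eps):
    """Check log(N^w' / 2^b') >= log(1 / (2^(2(w'-1)) * eps^(2w')))."""
    have = w_even * log_N - b_int
    need = -2 * (w_even - 1) - 2 * w_even * log_eps
    return have >= need
```

For example, w = 3.17 with $\log N = 30$ and $\epsilon = 1/8$ is rounded to w' = 4 and b' = 102, and the check succeeds with equality.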

Corollary 1. Using the notation from the previous lemma. Making $N > \frac{4}{(2\epsilon)^4}$ oracle calls, the n-dimensional
LPN$_\epsilon$ problem can be solved in time and space

    $O\left(2^{(n+2)T + \log N + \log((n+2)T + \log N)}\right)$.

Proof. Immediate, as $|r| = \left|w - 2\left\lfloor\frac{w + 1}{2}\right\rfloor\right| \le 1$. □

It is not hard to see that if log N is large, $T_\epsilon(N)$ will converge to $\frac{1}{3}$. Clearly

    $T_\epsilon(N) = \frac{\log N}{3\log N + 4 + 4\log\epsilon} = \frac{1}{3}\left(1 - \frac{4 + 4\log\epsilon}{3\log N + 4 + 4\log\epsilon}\right)$.

Recall that $\log\epsilon \le -1$ and since $N > \frac{4}{(2\epsilon)^4}$ we have that $\log N > -2 - 4\log\epsilon$. Consequently

    $T_\epsilon(N) < \frac{1}{3}\left(1 + \frac{4 + 4\log\epsilon}{2 + 8\log\epsilon}\right) < \frac{1}{3}\left(1 + \frac{1}{2}\right) = \frac{1}{2}$.
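
The behaviour of $T_\epsilon(N)$ can be observed numerically. The following sketch evaluates definition (11) directly; the function name `t_eps` and the base-2 log arguments are assumptions of this illustration.

```python
def t_eps(log_N, log_eps):
    """T_eps(N) = log N / (3 log N + 4 + 4 log eps), with base-2 logarithms."""
    return log_N / (3 * log_N + 4 + 4 * log_eps)
```

For $\epsilon = 1/8$ the condition $N > \frac{4}{(2\epsilon)^4}$ reads $\log N > 10$; values just above this threshold stay below 1/2, and the function decreases towards 1/3 as log N grows.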

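In the case where N is significantly bigger, particularly if

    $N > 2^{n/\log n}\,\frac{4}{(2\epsilon)^4} > 2^{\frac{n}{\log n} - \frac{4}{3} - \frac{4}{3}\log\epsilon}$,

one readily verifies that

    $T_\epsilon(N) < \frac{1}{3}\left(1 - \frac{4}{3}\,(1 + \log\epsilon)\,\frac{\log n}{n}\right)$.    (18)
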
We can prove the following lemma: 

Lemma 4. If $\epsilon = \frac{1}{\mathrm{poly}(n)}$ we can solve the n-dimensional LPN$_\epsilon$ in time and space

    $2^{n/3 + o(n)}$

making $N > 2^{n/\log n}$ oracle calls.

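Proof. First notice that with $\epsilon = \frac{1}{\mathrm{poly}(n)}$, we have that $\log\frac{1}{\epsilon} = o(n)$. We will use only a subset of
$N' = 2^{n/\log n}\,\frac{4}{(2\epsilon)^4}$ of the given equations. Write $T' := T_\epsilon(N')$. Then

    $(n + 2)T' < \frac{n + 2}{3}\left(1 - \frac{4\log n\,(\log\epsilon + 1)}{3n}\right) = \frac{1}{3}n + \underbrace{\frac{2}{3} - \frac{4(n + 2)}{9n}(\log\epsilon + 1)\log n}_{= o(n)}$.

Further

    $\log N' = \frac{n}{\log n} + \log\frac{4}{(2\epsilon)^4} = o(n)$.

So from Lemma 3 and as also $\log((n + 2)T' + \log N') = o(n)$, we get that we can find the solution in time

    $O\left(2^{(n+2)T' + \log N' + \log((n+2)T' + \log N')}\right) = 2^{n/3 + o(n)}$.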

□

We have seen that we do not need more than $N = 2^{n/\log n + o(n)}$ equations to solve the LPN problem in
essentially cube-root time. As seen in the proof of Lemma 4, given $N \gg 2^{n/\log n}$ the approach makes use
of $N' = 2^{n/\log n}\,\frac{4}{(2\epsilon)^4}$ of the given equations. The resulting overhead can be exploited to further reduce the
complexity. The principle used in the case where $N \gg 2^{n/\log n}$ is called decimation [8]. Given

    $N = 2^{l} \cdot 2^{n/\log n}\,\frac{4}{(2\epsilon)^4}$

equations, the problem is basically reduced to solving the LPN problem in dimension n − l.

3.1 Decimation 

We have seen that if $N = 2^{n/\log n + o(n)}$ the complexity of the LPN problem is $2^{n/3 + o(n)}$. If we are given
$N \gg 2^{n/\log n}$ equations, simple decimation allows us to reduce the security parameter n of the problem.
Suppose we are given

    $N = 2^{n/\log n + l}\,\frac{4}{(2\epsilon)^4}$

equations with l < n − 2. We want to consider only the equations that do not depend on (e.g. the first) $l' \in \mathbb{N}$
bits of the key x. We have an expected number $N 2^{-l'}$ of such equations. In order to be able to recover the
remaining n − l' keybits, the following inequality must hold:

    $N 2^{-l'} = 2^{n/\log n + l - l'}\,\frac{4}{(2\epsilon)^4} \ge 2^{(n - l')/\log(n - l')}\,\frac{4}{(2\epsilon)^4}$.

Equivalently,

    $\frac{n}{\log n} + l - l' \ge \frac{n - l'}{\log(n - l')}$.

Setting $l' = \lfloor l \rfloor \le n - 3$, this inequality is satisfied and we can reduce the problem parameter n to n − l'.
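
Decimation itself is a simple filter. The sketch below assumes samples are stored as (label, g) pairs with g a list of bits, which is a representation chosen for this illustration; the name `decimate` is hypothetical.

```python
def decimate(samples, l_prime):
    """Keep the samples whose first l_prime coefficients are all zero and
    drop those coefficients, leaving equations in the remaining key bits."""
    return [
        (label, g[l_prime:])
        for label, g in samples
        if not any(g[:l_prime])
    ]
```

For uniformly random g, each sample survives with probability $2^{-l'}$, which gives the expected number $N 2^{-l'}$ of remaining equations used above.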

Lemma 5. If $\epsilon = \frac{1}{\mathrm{poly}(n)}$ we can solve the LPN$_\epsilon$ in time and space

    $2^{(n - l)/3 + o(n)}$

making $N > 2^{n/\log n + l}$ oracle calls.

4 Example 

We have seen that the LPN problem can be solved in essentially cube-root time and space. Consider the 
classical setting of a fast correlation attack [9]. Suppose we have a stream cipher with keylength n = 128 
whose output bits correspond to linear combinations of the keybits transmitted over the binary symmetric 
channel with crossover probability $\frac{1}{2} - \epsilon = \frac{1}{2} - \frac{1}{8} = 0.375$. For a given number $N > 2^{10}$ (note that
$\log\frac{4}{(2\epsilon)^4} = 10$) of equations, we have seen how to in principle optimally choose w and b (see (9) and (10)).
However these values are not necessarily in $\mathbb{N}$ and w is not necessarily even. So w is rounded to the nearest
even number w' and b' is chosen accordingly (see (15) in the proof of Lemma 3). This gives an additional
summand $\le |r|\log N$ in the exponent of the complexity (compare (14)). Table 1 shows how this rounding
problem influences the complexity. Decimation is not considered in this example.

log N    w        b        w'    b'     |r| log N    log C_LC    log C_HT
10       11.82    68.91    12    70     1.8          60          63.64
20       5        78       6     94     20           60          38.70
30       3.17     80.44    4     102    24.8         60          30.17
40       2.32     81.57    2     70     12.8         40          61.32
47       1.95     82.06    2     84     2.1          47          47.32
50       1.83     82.22    2     90     8.5          50          41.32

Table 1: Complexity of Linear Combination C_LC and Hypothesis testing C_HT depending on the number of
equations.
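
The rows of Table 1 can be recomputed from the formulas of Section 3. The sketch below assumes that log C_LC is the exponent of $N^{w'/2}$ and that log C_HT is the exponent of the expression in Lemma 2 evaluated at w' and b'; this reading reproduces the tabulated values, but the function name `table_row` and its return layout are choices made here.

```python
import math

def table_row(n, log_N, log_eps):
    """Return (w, b, w', b', |r| log N, log C_LC, log C_HT); logs are base 2."""
    slope = log_N + 2 + 2 * log_eps
    w = 2 * (n + 2) / (3 * log_N + 4 + 4 * log_eps)   # equation (9)
    b = w * slope - 2                                  # equation (8)
    w_even = 2 * math.floor((w + 1) / 2)               # rounding as in (15)
    b_int = math.floor(w_even * slope - 2)
    # log of the number of equations needed: 1 / (2^(2(w'-1)) * eps^(2w'))
    log_needed = -2 * (w_even - 1) - 2 * w_even * log_eps
    log_c_lc = w_even / 2 * log_N
    log_c_ht = (n - b_int) + math.log2(log_needed)
    return w, b, w_even, b_int, abs(w_even - w) * log_N, log_c_lc, log_c_ht
```

Calling `table_row(128, 10, -3)` corresponds to the first row of the table (n = 128, $\epsilon = 1/8$, $\log N = 10$).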